Abstract

We show that the problem of constructing tree-structured descriptions of data layouts
that are optimal with respect to space or other criteria, from given sequences of
displacements, can be solved in polynomial time. The problem is relevant for efficient compiler
and library support for communication of non-contiguous data, where tree-structured
descriptions with low-degree nodes and small index arrays are beneficial for the communication
soft- and hardware. An important example is the Message-Passing Interface (MPI), which
has a mechanism for describing arbitrary data layouts as trees using a set of increasingly
general constructors. Our algorithm shows that the so-called MPI datatype reconstruction
problem by trees with the full set of MPI constructors can be solved optimally in polynomial
time, refuting previous conjectures that the problem is NP-hard. Our algorithm can handle
further, natural constructors, currently not found in MPI.

Our algorithm is based on dynamic programming, and requires the solution of a series
of shortest path problems on an incrementally built, directed, acyclic graph. The algorithm
runs in $O(n^4)$ time steps and requires $O(n^2)$ space for input displacement sequences of
length $n$.


1 Introduction 

It is a common situation, for instance in parallel numerical libraries, that substructures of
large, static data structures have to be communicated among processors m, e.g., row or column
vectors or sub-matrices of multi-dimensional matrices, or irregular substructures corresponding
to the non-zeros or other special elements of larger structures. This requires efficient access
to the typically non-contiguously stored substructure elements in some predefined order, either
for the application, which "(un)packs" the elements to (from) some structured communication
buffer, or for the communication soft- or hardware to handle the non-consecutive communication
in a way that is transparent to the application. For the latter approach, concise and efficient
descriptions of such substructures are needed. For instance, lists of element addresses or
displacements are neither concise (space proportional to the number of elements is required)
nor efficient (processing time is at least doubled, since the list also has to be traversed).
For substructures with some regularities, much better representations are obviously possible.
Often, tree representations are used, with leaves describing base types and interior constructor
nodes describing how subtrees are repeated. For example, complex data types in C-like languages
can be built recursively using a small number of constructors (like arrays and structs) from
given primitive types (ints, chars, doubles, etc.), and the resulting type trees describe to the
compiler how data are laid out in memory. The same kind of mechanism could be used to describe
substructures of such data types (but is not a part of C). The Message-Passing Interface (MPI) [8]
is an important example of a parallel communication interface, indeed often used to implement
parallel numerical libraries mi, which provides a generic, explicit mechanism for describing
non-consecutive application data to allow the library implementation to perform non-consecutive
communication in an efficient way, possibly by directly exploiting hardware features for, e.g.,
strided, non-consecutive communication. Given such a tree-structured description of an
application data layout, it is a natural question to ask whether this description is optimal
under some given cost model reflecting the cost of storing or processing the description.
Likewise, given a trivial description of a data layout in the form of a long list of addresses
(or offsets, or displacements), it is natural to ask for an algorithm for constructing an
efficient, that is, cost-optimal representation as a tree with some given set of constructors.
In the MPI community, the former problem is referred to as type normalization, and the latter
as type reconstruction [3]. Both problems are ultimately important for the implementation of
very high-quality MPI libraries. The problems would be similarly important in other parallel
interfaces or languages supporting communication of arbitrarily structured, non-consecutive
data. Ideally, a compiler would be able to perform the normalization (optimization) of data
layout descriptions given more or less explicitly by the application programmer in the code
with the constructs available in the parallel language [12].

* This work was co-funded by the European Commission through the EPiGRAM project (grant
agreement no. 610598).

In this paper, we investigate primarily the type reconstruction problem for a given set of
constructors, that is, the problem of finding the most concise tree representation of a given
substructure specified by an explicit list of displacements. As the set of constructors, we use
a convenient abstraction of the type constructors found in MPI [8, Chapter 4]. This is both a
natural and powerful set that includes constructors for the case where a single substructure is
repeated in a regular or irregular pattern as well as the case where different substructures are
concatenated with given displacements. Our main result is to show that an optimally concise
tree representation can be found in polynomial time for the whole set of constructors, and thus
as a corollary that both type reconstruction and type normalization for the whole set of MPI
derived data type constructors can be solved in polynomial time. This is an interesting result
since the computational hardness of the problem was not known before. Indeed, the problem
was believed not to be in P by parts of the MPI community. Specifically, we give an algorithm
that finds an optimal type tree description for a sequence of displacements of length $n$ in
$O(n^4)$ operations. The algorithm is based on a non-trivial use of dynamic programming requiring
the solution of a single-source shortest path problem for each new subproblem solution. Using
standard dynamic programming techniques, the space requirement is $O(n^2)$.

MPI libraries typically employ simple forms of type normalization to derived data types set 
up by the application programmer (this is folklore, but see [51ISIITIJ] for explicit descriptions). In 
recent papers ms], the problem was more systematically analyzed, and it was shown that when 
restricted to certain homogeneous constructors (those having a single child) the reconstruction 
and normalization problems can be solved quite efficiently in low, polynomial time. It was 
explicitly conjectured that the problems with the full set of MPI derived data type constructors 
would be NP-hard [an. We stress that when it is allowed to fold the constructed trees into
even more concise, directed acyclic graphs (DAGs), the optimality of our construction is no 
longer guaranteed. We discuss this problem at the end of the paper. 

The notion of an optimal tree-like representation of a data layout is of course relative to 
the way the tree will be used and processed by the parallel programming language or library 
implementation. Processing typically includes the ability to pack and unpack parts of the layout 
independently using hardware support for blocked, strided memory access and similar features 
of the communication subsystem. We do not deal with the problem of efficient datatype-tree 
processing here, but abstract storage and processing costs with a simple, parameterized cost 
model, which must be adapted to the concrete situation. The literature on optimization of 
the processing of tree representations of data layouts in MPI is large; some pointers are given 
in [13].

The paper is structured as follows. We define the set of considered constructors and precisely
formulate the type reconstruction problem in Section 2. Our main result is given in Section 3,
which describes our dynamic programming algorithm, proves correctness and establishes the
complexity bound. In Section 4 we discuss how our approach can be extended to include other
convenient and in specific situations more concise constructors, and how the problem changes
when trees can be folded into DAGs. Concluding remarks, including a discussion of relevant
future work in this area, are given in Section 5.

2 The type reconstruction problem 

A data layout is an ordered sequence of relative (integer) displacements, each indexing a certain
base data type (integer, char, floating point number) relative to some base address. Since
the semantics of base types will not be important for the following, we abstract the problem and
consider from here onward displacement sequences, which we write as
$D = (d_0, d_1, \ldots, d_{n-1})$ with the displacements $D[i] = d_i$ being indexed from $0$ to
$n-1$. We point out that the complexity of the problems that we investigate does not change by
considering full type maps consisting of sequences of displacements with their associated base
type (and number of bytes occupied), as would have to be done in a concrete implementation of
our algorithms for real libraries, although of course the structure of the reconstructed types
may look different. A segment of an $n$-element displacement sequence from index $i$ to index
$j$ is denoted by $D[i,j] = (d_i, d_{i+1}, \ldots, d_j)$, $0 \le i \le j < n$. A prefix of length
$c$ is the segment $D[0, c-1]$. The displacements of the sequence are arbitrary (non-negative,
negative) integers, and the same displacement can appear more than once (although this will
normally not be the case, and is often disallowed, e.g., for some uses of derived data types in
MPI). Thinking of displacements as (byte) addresses, it is clear that any application data
layout can be described by a displacement sequence. The ordering constraint (displacement
sequence, not displacement set) implies that data are accessed in a specific order. This is
often important for data layouts used in communication operations.

Displacement sequences typically contain regularities and some form of structure, since they 
can be thought of as arising from a specific application, and this can be exploited to obtain 
more concise descriptions. We do this by type trees, where interior constructor nodes describe 
some ordered catenation of the layout(s) described by the child(ren) node(s). It is natural to 
ask for an efficient, polynomial time algorithm for computing the most concise and efficient 
representation for a given set of constructors and cost model. 

We consider the following set of constructors that subsume constructors found in C-like
programming languages, as well as the derived data type constructors found in MPI:
strc(2, (0, 60))
 ├─ con(5)
 └─ vec(5, -10)
     └─ idx(3, (0, -4, 7))
         └─ con(1)

Figure 1: Type tree representing the displacement sequence D = (0, 1, 2, 3, 4, 60, 56, 67, 50,
46, 57, 40, 36, 47, 30, 26, 37, 20, 16, 27). Note that if the strc constructor is not allowed, the
only way to represent this displacement sequence is the trivial representation idx(20, D, con(1)).

Definition 1 (Basic type constructors) A basic tree may be constructed from the following
four basic constructors:

1. A leaf con(c) with count $c$ describes a sequence of $c$ adjacent relative displacements
$0, 1, 2, \ldots, c-1$.

2. A (homogeneous) vector vec(c, d, C) with count $c$ and stride $d$ describes the catenation of
$c$ sequences $C$ at relative displacements $0, d, 2d, \ldots, (c-1)d$.

3. A (homogeneous) index $\mathrm{idx}(c, (i_0, i_1, \ldots, i_{c-1}), C)$ with count $c$ and
indices $(i_0, i_1, \ldots, i_{c-1})$ describes the catenation of $c$ sequences $C$ at relative
displacements $i_0, i_1, \ldots, i_{c-1}$.

4. A heterogeneous index, or struct,
$\mathrm{strc}(c, (i_0, i_1, \ldots, i_{c-1}), (C_0, C_1, \ldots, C_{c-1}))$, with count $c$ and
indices $(i_0, i_1, \ldots, i_{c-1})$ describes the catenation of $c$ sequences
$C_0, C_1, \ldots, C_{c-1}$ at relative displacements $i_0, i_1, \ldots, i_{c-1}$.

For example, the displacement sequence (3, 5, 7, 9, 11) can be described by idx(1, (3), vec(5, 2,
con(1))). A more involved example is shown in Figure 1. Note that any displacement sequence
D of length n can trivially be represented as idx(n, D, con(1)).

We refer to vertices of type trees as nodes, where each node is one of the constructors. 

It can easily be shown that each of the MPI derived data type constructors (for contiguous,
vector, index, and structured subtrees) [8, Chapter 4] is expressible by the basic constructors
of Definition 1, and that the mapping is almost one-to-one. For instance, the MPI_Type_vector
constructor denotes a layout consisting of a strided sequence of blocks, each being a strided
sequence of some type B. This is expressed as vec(c, s, vec(b, e, B)) where c is the number of
blocks, s their stride, b the number of elements in each block, and e the stride used within each
block. We treat base types as sequences of bytes which can be expressed by leaf nodes, e.g.,
a 32-bit entity like int would be expressed by con(4). The idx constructor makes it possible
to express the repetition of the same layout B, each at some arbitrary displacement; for this
only the sequence of start indices (and the size of this sequence) needs to be represented. The
most expressive, arbitrary branching constructor strc can express the catenation of a sequence
of possibly different, smaller layouts, each starting at an arbitrary displacement. This is the
only constructor node with arity greater than one. In contrast to the similar MPI constructor
MPI_Type_create_struct, which also takes a repetition count (blocklength) for each substructure,
the strc constructor saves this extra sequence. If a substructure is indeed a repetition of some
even smaller substructure, this information is part of the substructure and not of the strc node
itself. The basic constructors increase in generality and storage cost: an idx node is a strc node
where all substructures are similar, and therefore does not need to store a sequence of subtypes;
a vec node is an idx node with regularly strided displacements, which can be computed from a
single scalar instead of storing an explicit index sequence. As the example in Figure 1 shows,
the strc constructor makes unbounded compression possible over the idx constructor.

To make it possible to express further common patterns without redundancy, we also
consider a few auxiliary constructors. The patterns that these constructors capture can all be
expressed by two-level nestings of basic constructors, but possibly at a higher cost. For
practical purposes and depending on the application usage patterns that are intended to be
supported, it might therefore make sense to have a richer set of constructors. For instance, MPI
has both an MPI_Type_create_indexed_block (which is captured by the idx basic constructor node)
and an MPI_Type_indexed constructor which stores also a repetition count for each index. In cases
where all substructures are repeated the same number of times, this is strictly redundant, and
there are therefore use cases for both constructors. We include the auxiliary constructors to
argue informally that our algorithm can handle a large set of reasonable constructors.


Definition 2 (Auxiliary type constructors) An extended tree may contain also the following
two auxiliary constructors:

1. A strided bucket, $\mathrm{vecbuc}(c, d, e, (b_0, b_1, \ldots, b_{c-1}), C)$ with count $c$
and strides $d, e$ describes the catenation of $c$ sequences at relative displacements
$0, d, 2d, \ldots, (c-1)d$. The $i$-th sequence is the catenation of $b_i$ sequences $C$ at
relative displacements $0, e, 2e, \ldots, (b_i-1)e$.

2. An indexed bucket,
$\mathrm{idxbuc}(c, e, (i_0, i_1, \ldots, i_{c-1}), (b_0, b_1, \ldots, b_{c-1}), C)$, with count
$c$ and substride $e$ describes the catenation of $c$ sequences at relative indices
$i_0, i_1, \ldots, i_{c-1}$. The $i$-th sequence is the catenation of $b_i$ sequences $C$ at
relative displacements $0, e, 2e, \ldots, (b_i-1)e$.

As can be seen from the discussion above, the indexed bucket constructor corresponds to
the MPI_Type_indexed constructor. There is no MPI counterpart of the other, arguably natural
constructor. We discuss these constructors in more detail in Section 4.
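The two bucket constructors of Definition 2 can be illustrated by a minimal Python sketch. The function names `flatten_vecbuc` and `flatten_idxbuc` and the parameter `sub` (the already flattened displacement sequence of the subtree C at base 0) are assumptions made for this illustration only; they are not part of MPI or of the algorithm in this paper.

```python
def flatten_vecbuc(c, d, e, b, sub, base=0):
    """Displacements of vecbuc(c, d, e, b, C); sub is C flattened at base 0."""
    out = []
    for i in range(c):              # i-th bucket starts at displacement i*d
        for k in range(b[i]):       # b[i] copies of C with substride e
            out.extend(base + i * d + k * e + x for x in sub)
    return out


def flatten_idxbuc(e, indices, b, sub, base=0):
    """Displacements of idxbuc(c, e, indices, b, C) with c = len(indices)."""
    out = []
    for start, count in zip(indices, b):   # bucket starts at an explicit index
        for k in range(count):             # count copies of C with substride e
            out.extend(base + start + k * e + x for x in sub)
    return out


print(flatten_vecbuc(2, 10, 1, (2, 3), [0]))
print(flatten_idxbuc(2, (5, 0), (1, 2), [0, 1]))
```

The only difference between the two functions is where a bucket starts: at a multiple of the stride d, or at an explicitly stored index.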

Each basic or extended tree represents one displacement sequence, obtained by an ordered
traversal of the nodes of the type tree. This process is called flattening and is captured by
the algorithm in Listing 1 for the basic constructors; the auxiliary constructors can be handled
similarly. The converse is not true: a displacement sequence will almost always have several
possible type tree representations.
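As a concrete illustration of flattening, the following Python sketch mirrors the four cases of the flattening procedure for the basic constructors. The tuple encoding of nodes (`("con", c)`, `("vec", c, d, subtree)`, `("idx", indices, subtree)`, `("strc", indices, subtrees)`) is an assumption of this sketch; the count of an idx or strc node is implied by the length of its index tuple.

```python
def flatten(t, base=0):
    """Return the displacement sequence described by the basic tree t."""
    kind = t[0]
    if kind == "con":                       # ("con", c): c consecutive indices
        return [base + i for i in range(t[1])]
    if kind == "vec":                       # ("vec", c, d, subtree): strided
        _, c, d, sub = t
        return [x for i in range(c) for x in flatten(sub, base + i * d)]
    if kind == "idx":                       # ("idx", indices, subtree)
        _, indices, sub = t
        return [x for off in indices for x in flatten(sub, base + off)]
    if kind == "strc":                      # ("strc", indices, subtrees)
        _, indices, subs = t
        return [x for off, s in zip(indices, subs)
                for x in flatten(s, base + off)]
    raise ValueError(f"unknown node type: {kind!r}")


# The two example trees discussed in the text.
SMALL = ("idx", (3,), ("vec", 5, 2, ("con", 1)))
FIG1 = ("strc", (0, 60),
        (("con", 5), ("vec", 5, -10, ("idx", (0, -4, 7), ("con", 1)))))

print(flatten(SMALL))
print(flatten(FIG1))
```

Unlike the printing procedure, this sketch returns the sequence as a list, which makes it easy to compare a candidate tree against a given displacement sequence.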

We make no claim that Listing 1 depicts a particularly good way of implementing
flattening M- Note that the size of the displacement sequence described by a type tree T could be
much larger than the number of nodes in T. Within this paper, we assume that all numbers
can be represented by a constant number of bits; otherwise, our main result still holds, but the
upper bound on space requirements increases by a logarithmic factor.

By the conciseness of a type tree we mean the space taken by the representation. This is
constant for vector and leaf nodes and proportional to the size of the index and type sequences
for the other constructors. Processing costs are related to conciseness: the concise vector
constructor that describes a strided repetition of a sub-pattern can often be handled by strided
memory-copy or strided communication operations, whereas constructors with sequences of
displacements or types need at least a traversal of the corresponding sequences and typically
entail a more irregular and expensive access to memory. We will therefore first focus on a
simple cost model for optimizing conciseness.

Listing 1: Flattening procedure defining the displacement sequence represented by a given
basic tree T. The procedure is called with a base offset, which will normally be 0. The
procedure can trivially be extended to also cover extended trees.

    Function Flatten(T, base)
        switch T.nodetype do
            case con                        /* leaf of consecutive indices */
                for i ← 0; i < T.c; i++ do
                    print base + i
            case vec                        /* strided layout */
                for i ← 0; i < T.c; i++ do
                    Flatten(T.subtype, base + i · T.d)
            case idx                        /* indexed layout */
                for i ← 0; i < T.c; i++ do
                    Flatten(T.subtype, base + T.D[i])
            case strc                       /* indexed layout with subtypes */
                for i ← 0; i < T.c; i++ do
                    Flatten(T.subtypes[i], base + T.D[i])

The cost of a type node shall be proportional to the number of words that must be stored
to process the node. This includes the node type (con, vec, idx, strc), count, displacement or
pointer to index or type array, pointer to child node(s), and a lookup cost for the elements in
lists of indices or types:

$$
\begin{aligned}
\mathrm{cost}(\mathrm{con}(c)) &= K_{\mathrm{con}} \\
\mathrm{cost}(\mathrm{vec}(c, d, C)) &= K_{\mathrm{vec}} \\
\mathrm{cost}(\mathrm{idx}(c, (\ldots), C)) &= K_{\mathrm{idx}} + c\,K_{\mathrm{lookup}} \\
\mathrm{cost}(\mathrm{strc}(c, (\ldots), (\ldots))) &= K_{\mathrm{strc}} + 2c\,K_{\mathrm{lookup}}
\end{aligned}
$$

The constants can be adjusted to reflect other overheads related to representing and processing
a node. We define the cost of a type tree T to be the additive cost of its nodes $T_i$:
$\mathrm{cost}(T) = \sum_i \mathrm{cost}(T_i)$.


Listing 2: A possible Typenode structure for representing nodes in type trees or DAGs.

    typedef struct Typenode {
        enum { con, vec, idx, strc } nodetype;
        int c;                        /* count */
        int d;                        /* stride */
        int *D;                       /* displacements of subtypes */
        struct Typenode *subtype;     /* subtype */
        struct Typenode **subtypes;   /* array of subtypes */
    } Typenode;



For the examples given in this paper, we take
$K_{\mathrm{con}} = K_{\mathrm{vec}} = K_{\mathrm{idx}} = K_{\mathrm{strc}}$ and
$K_{\mathrm{lookup}} = 1$. For instance, with a C-style structure as shown in Listing 2 to
represent any of the type constructors, all constructors indeed have the same constant in the
cost (which we could take as 6 units). We remark that our algorithm is not dependent on the
specific choice of the cost function, and that our results also hold for other reasonable cost
functions where the cost of a node is a function of the node itself and the costs of its children.
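To make the cost model tangible, the following Python sketch evaluates the additive tree cost for the tree of Figure 1 and for the trivial representation of the same sequence. It assumes a common per-node constant `k_node` (6 units, as suggested above) and `k_lookup` = 1; the tuple encoding of nodes and the function name `cost` are assumptions of this sketch.

```python
def cost(t, k_node=6, k_lookup=1):
    """Additive cost of a basic tree under the simple cost model."""
    kind = t[0]
    if kind == "con":                      # ("con", c): constant cost
        return k_node
    if kind == "vec":                      # ("vec", c, d, subtree): constant
        return k_node + cost(t[3], k_node, k_lookup)
    if kind == "idx":                      # one lookup per stored index
        _, indices, sub = t
        return k_node + len(indices) * k_lookup + cost(sub, k_node, k_lookup)
    if kind == "strc":                     # one index and one subtype per child
        _, indices, subs = t
        return (k_node + 2 * len(indices) * k_lookup
                + sum(cost(s, k_node, k_lookup) for s in subs))
    raise ValueError(f"unknown node type: {kind!r}")


D = [0, 1, 2, 3, 4, 60, 56, 67, 50, 46, 57, 40, 36, 47,
     30, 26, 37, 20, 16, 27]
FIG1 = ("strc", (0, 60),
        (("con", 5), ("vec", 5, -10, ("idx", (0, -4, 7), ("con", 1)))))
TRIVIAL = ("idx", tuple(D), ("con", 1))

for k in (6, 1):
    print(k, cost(FIG1, k_node=k), cost(TRIVIAL, k_node=k))
```

Under these assumed constants the five-node tree costs 5 `k_node` + 7 and the trivial tree 2 `k_node` + 20, so which of the two is cheaper for this short sequence depends on the per-node constant. This is one reason why the model is kept parameterized.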

We can now formally define the problem that we will solve in the next section. Recall that
a type tree T represents a displacement sequence D if Flatten(T, 0) = D.


Basic Type Reconstruction Problem
Instance: A displacement sequence D of length n.

Task: Find a least-cost (or optimal) basic tree T representing D; that is,
$\mathrm{cost}(T) \le \mathrm{cost}(T')$ for any basic tree $T'$ representing D.


3 Basic tree reconstruction in polynomial time 

We now present our main result, namely that the Basic Type Reconstruction Problem
can be solved in polynomial time, and subsequently discuss extending the set of constructors
by the auxiliary constructors of Definition 2.

Theorem 1 For any input displacement sequence D of length n, the Basic Type
Reconstruction Problem can be solved in $O(n^4)$ time and $O(n^2)$ space.

Proof outline: We first give a characterization of the structure of optimal basic trees
(Lemma 1) which allows for a simple and elegant procedure to solve the special case of
displacement sequences in normal form (Definition 6).

The fundamental observation for the proof is that any (non-trivial) displacement sequence
can be described by either a catenation of the same kind of shorter displacement sequences
(and thus by either a vector or an index constructor) or by a catenation of different, but shorter
displacement sequences (and thus by a struct constructor). In both cases, for an optimal
description, the description of the shorter sequences must likewise be optimal, and the principle
of optimality applies. This intuition is formalized in Lemma 2 and Lemma 3. A subsequent lemma
proves the claim for the special case of displacement sequences in normal form, with a detailed
procedure given in an accompanying listing.

Finally, a last lemma shows how to construct an optimal basic tree for any displacement
sequence out of an optimal basic tree representation of its normal form.

Definition 3 (Repetition, Strided Repetition) A repetition in a displacement sequence D
of length n is a prefix $C = D[0, q-1]$ of length $q$ s.t. $q$ is a divisor of $n$ and for all
$i, j$ with $1 \le i < n/q$, $0 \le j < q$ we have that $D[j] - D[0] = D[iq + j] - D[iq]$. A
strided repetition of length $q$ additionally fulfills $D[(i+1)q] - D[iq] = D[q] - D[0]$ for all
$i$, $0 \le i < n/q - 1$, where $d = D[q] - D[0]$ is the stride of the repetition.

The intention of the functions Repeated and Strided (see Listing 3) is to find (strided)
repetitions C of a displacement sequence D that can be exploited to represent D via an idx or
vec constructor with subsequence C. It is easy to see that Repeated and Strided as outlined
both take linear time.
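The two checks can be written down directly from Definition 3. The following Python sketch uses the assumed names `repeated` and `strided`; as in the use described later in the text, `strided` is meant to be applied to the sequence of block start displacements.

```python
def repeated(D, q):
    """True if the prefix D[0:q] is a repetition in D (Definition 3)."""
    n = len(D)
    if q <= 0 or n % q != 0:
        return False
    # Every block of length q must have the same internal offsets as the first.
    return all(D[j] - D[0] == D[i + j] - D[i]
               for i in range(q, n, q) for j in range(1, q))


def strided(I):
    """True if consecutive elements of I all differ by the same stride."""
    return all(I[k] - I[k - 1] == I[1] - I[0] for k in range(1, len(I)))


# Tail of the sequence from Figure 1: five blocks of length three.
TAIL = [60, 56, 67, 50, 46, 57, 40, 36, 47, 30, 26, 37, 20, 16, 27]
print(repeated(TAIL, 3), repeated(TAIL, 5))
print(strided(TAIL[::3]), strided(TAIL))
```

Both functions make a single pass over their input, in line with the linear-time claim above.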

Listing 3: Trivial checks for repetitions and strided repetitions.

    Function Repeated(D, n, q)
        for i ← q; i < n; i ← i + q do
            for j ← 1; j < q; j ← j + 1 do
                if D[j] − D[0] ≠ D[i + j] − D[i] then
                    return false
        return true

    Function Strided(D, n)
        d ← D[1] − D[0]
        for i ← 1; i < n; i ← i + 1 do
            if D[i] − D[i − 1] ≠ d then return false
        return true

As mentioned above, any displacement sequence D can be described by either a catenation
of the same kind of shorter displacement sequences or by a catenation of different, but shorter
displacement sequences. Additionally, a representation via a con node is possible if D is a trivial
displacement sequence $(0, 1, \ldots, n-1)$. In terms of type trees, this means that an optimal
basic tree T for a displacement sequence D is either

1. $T = \mathrm{con}(n)$, a single con node with count $n$; or

2. $T = \mathrm{vec}(c, d, S)$, where the prefix $D[0, q-1]$ of length $q = n/c$ is a strided
repetition in D with stride $d$ and $S$ is an optimal basic tree for the prefix
$(D[0], \ldots, D[q-1])$; or

3. $T = \mathrm{idx}(c, (i_0, \ldots, i_{c-1}), S)$, where the prefix $D[0, q-1]$ of length
$q = n/c$ is a repetition in D, $S$ is an optimal basic tree for the sequence
$(D[0] - i_0, \ldots, D[q-1] - i_0)$ and the indices $i_0, \ldots, i_{c-1}$ are such that
Flatten(T, 0) = D; or

4. $T = \mathrm{strc}(c, (i_0, \ldots, i_{c-1}), (S_0, \ldots, S_{c-1}))$, where the $S_j$ for
$0 \le j < c$ are optimal basic trees for some sequences $C_j$ which together with the indices
$i_0, \ldots, i_{c-1}$ are such that Flatten(T, 0) = D.

While the first case can be handled with a single scan of D, the others are more involved. In 
the following, we give a more detailed characterization of (optimal) basic trees to tackle the 
problem. 

Definition 4 (Shifted node) We call an index node $\mathrm{idx}(c, (i_0, \ldots), C)$ or a
struct node $\mathrm{strc}(c, (i_0, \ldots), (\ldots))$ with $i_0 \ne 0$ a shifted node;
$s = i_0$ is called the node's shift.

Note that adding some value $s$ to all indices of an idx or strc node $N$ shifts the sequence
represented by the basic tree rooted at $N$ by $s$.

Definition 5 (Nice basic tree) A nice basic tree contains at most one shifted node, which is 
the first idx or strc node on every root to leaf path. 

Lemma 1 For any basic tree T representing a displacement sequence D, a nice basic tree
representation T' of D of equal cost exists.

Proof: A node is bad if it is a shifted node and it is not the first idx or strc node on every
root to leaf path. Let D be a fixed displacement sequence and let T be a basic tree representing
D with a minimum number of bad nodes. We will show that T is, in fact, nice.

Assume that a bad index node (the proof is analogous for a bad struct node)
$N_I = \mathrm{idx}(c, (i_0, \ldots, i_{c-1}), \ldots)$ is present in the $k$-th subtree of a
struct node $N_S = \mathrm{strc}(c', (i'_0, \ldots, i'_k, \ldots, i'_{c'-1}), (\ldots))$ s.t.
there is no other shifted node on the path from $N_I$ to $N_S$. We can change $N_I$ to a
non-shifted index node by subtracting its shift $s = i_0$ from all indices $i_j$, for
$0 \le j < c$, and adding $s$ to the $k$-th index of $N_S$, i.e.,
$N_I = \mathrm{idx}(c, (0, i_1 - s, \ldots, i_{c-1} - s), \ldots)$ and
$N_S = \mathrm{strc}(c', (i'_0, \ldots, i'_k + s, \ldots, i'_{c'-1}), (\ldots))$. Notice that
the basic tree obtained in this way still represents the same displacement sequence D but
contains one less bad node, and hence the existence of such a node $N_I$ would contradict our
choice of T.

Hence there is no strc node on the path from a bad node $N_I$ to the root node $R$. If this
path contains an index node $N'_I \ne N_I$, proceed analogously to the previous case:
$N_I = \mathrm{idx}(c, (0, i_1 - s, \ldots, i_{c-1} - s), \ldots)$ and
$N'_I = \mathrm{idx}(c', (i'_0 + s, \ldots, i'_{c'-1} + s), \ldots)$. Again, the obtained
basic tree also represents D but contains one less bad node, contradicting our original choice of
T. Consequently, T does not contain any bad nodes and thus must be a nice basic tree. □


Corollary 1 Any optimal basic tree T contains at most one index node with count 1, i.e., at
most one node of the form $N = \mathrm{idx}(1, (i_0), \ldots)$. Additionally, there is no other
idx or strc node on the path from $N$ to the root.

Proof: Assume that T contains two index nodes with count 1. Since T is a tree, there is
an index node $N$ with count 1 s.t. the path from $N$ to the root node of T contains another
idx or strc node. In a cost-equivalent nice basic tree representation $\hat{T}$ (obtained by
applying the procedure from the proof of Lemma 1), the corresponding index node is
$\hat{N} = \mathrm{idx}(1, (0), T')$. Note that the type tree rooted at $\hat{N}$ represents
exactly the same displacement sequence as its subtype $T'$. Thus a representation of less cost
exists, which contradicts the assumption that T is optimal. □

The following proposition, although not directly required for the analysis, provides some 
additional insight into the structure of optimal basic trees. 

Proposition 1 The height of an optimal basic tree is $O(\log n)$.

Proof: It is easy to see that an optimal basic tree does not contain two consecutive strc
nodes, as they can always be merged into one while reducing the cost. For any basic tree T
that represents a sequence of length $n$, a basic tree idx(c, (...), T) or vec(c, ..., T) with
$c \ge 2$ represents a sequence of length at least $2n$. Let P be a maximum-length path from a
leaf to the root of an arbitrary optimal basic tree. Since any optimal basic tree contains at
most one idx node with count $c = 1$ (Corollary 1) and no vec node with $c = 1$, the length of
the represented sequence at least doubles with at least every other node on P. □


Definition 6 The normal form $\bar{D}$ of a displacement sequence D of length n is defined as
$\bar{D}[i] = D[i] - D[0]$, for all $i$, $0 \le i < n$.

In other words, the normal form $\bar{D}$ of a displacement sequence D is obtained by shifting D
so that its first element is 0.
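Computing the normal form is a single pass over the sequence. A minimal Python sketch, with the assumed function name `normal_form`:

```python
def normal_form(D):
    """Shift D so that its first element becomes 0 (Definition 6)."""
    if not D:
        return []
    first = D[0]
    return [x - first for x in D]


print(normal_form([3, 5, 7, 9, 11]))
print(normal_form([60, 56, 67]))
```

The shift D[0] that is removed here is exactly the amount by which the original sequence differs from its normal form.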

Corollary 2 An optimal basic tree T for a displacement sequence D in normal form does not
contain any shifted nodes or any idx, vec or strc node with count 1.

Proof: It follows directly from Lemma 1 and Corollary 1 that there exists an optimal basic
tree T for D which does not contain any shifted nodes. Note that a non-shifted idx, vec or
strc node with count 1 does not change the represented sequence. Thus, removing such nodes
from a basic tree reduces the cost while not changing the represented displacement sequence.
It follows that no such node can be part of an optimal basic tree. □

Observe that since there are no shifted nodes in an optimal basic tree T for D, any subtree
of T represents a segment of D in normal form. In the following, we will use $T_{i,j}$ to denote
an optimal basic tree representation for the normalized segment $D[i,j]$ of D.

For convenience, we define the function Min(S, T) which, given two basic trees S and T,
returns the one with least cost (if either is null, the other is returned). Note that the cost of
a basic tree can trivially be computed by a simple traversal. However, when constructing basic
trees from the bottom up (as we will do in this section), we keep for each node the cost of the
subtree rooted at that node. This allows for the cost of a basic tree to be queried in constant
time and thus for a constant-time implementation of Min.


Listing 4: Algorithm to find a least-cost representation for a displacement sequence in
normal form with an idx or vec node as root node.

    Function Repetition(D, n)
        T_R ← null
        foreach divisor q of n, q < n do
            c ← n/q
            if Repeated(D, n, q) then
                for i ← 0; i < c; i++ do
                    I[i] ← D[iq]
                T_idx ← idx(c, I, T_{0,q−1})
                T_R ← Min(T_idx, T_R)
                if Strided(I, c) then
                    d ← I[1] − I[0]
                    T_vec ← vec(c, d, T_{0,q−1})
                    T_R ← Min(T_vec, T_R)
        return T_R


Lemma 2 Let D be any displacement sequence of length $n$ in normal form and assume that
optimal basic tree representations for all normal form prefixes of length less than or equal to
$\lfloor n/2 \rfloor$ are known. A representation $T_R$, where the root node of $T_R$ is either
an idx or a vec node and $T_R$ is of least cost w.r.t. all possible representations of that form,
can be computed in $O(n\sqrt{n})$ time.

Proof: Listing 4 enumerates all possible representations of the desired form and chooses the
one with least cost among them. Note that for the divisor q = 1, the trivial representation
idx(n, D, con(1)) (which exists for any displacement sequence D) is generated and thus a valid
representation for D is guaranteed to be found. For the same reasons as given in Corollary 2,
idx nodes with count 1 cannot be part of a least-cost representation of the desired form and
thus need not be considered.

The number of divisors of n is upper-bounded by $2\lfloor \sqrt{n} \rfloor$ and, by assumption, optimal
representations for all prefixes of D of length less than or equal to $\lfloor n/2 \rfloor$ are known, i.e., $T_{0,j}$ is
known for all j, $0 \le j < n/2$. Since each divisor is processed in O(n) time, this implies the
claimed runtime bound. □
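
The enumeration performed by Listing 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `repeated` encodes one plausible reading of the predicate Repeated (every block of length q is the prefix shifted by the block's first displacement), the function names are hypothetical, and candidates are returned as plain tuples instead of being costed and compared with Min.

```python
def divisors_below(n):
    """All divisors q of n with q < n, in increasing order."""
    small, large = [], []
    q = 1
    while q * q <= n:
        if n % q == 0:
            small.append(q)
            if q != n // q:
                large.append(n // q)
        q += 1
    return [q for q in small + large[::-1] if q < n]


def repeated(D, q):
    """Assumed meaning of Repeated: each block of length q equals the
    prefix D[0:q] shifted by the block's first displacement."""
    return all(D[b + j] - D[b] == D[j]
               for b in range(0, len(D), q) for j in range(q))


def strided(I):
    """True if consecutive entries of I differ by one constant stride."""
    return all(I[k + 1] - I[k] == I[1] - I[0] for k in range(len(I) - 1))


def repetition_candidates(D):
    """Enumerate the idx/vec root candidates that Listing 4 compares.
    Each candidate is (kind, count, parameter, prefix_length)."""
    n = len(D)
    out = []
    for q in divisors_below(n):
        if repeated(D, q):
            c = n // q                      # q < n, hence c >= 2
            I = [D[i * q] for i in range(c)]
            out.append(("idx", c, I, q))
            if strided(I):
                out.append(("vec", c, I[1] - I[0], q))
    return out
```

For D = (0, 1, 10, 11, 20, 21) the sketch yields the trivial idx candidate for q = 1 and, for q = 2, both an idx and a vec candidate with count 3 and stride 10.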

Lemma 3 Let D be any displacement sequence of length n in normal form and assume that
optimal basic tree representations are known for all normal form segments of length strictly less
than n. A representation $T_s$, where the root node of $T_s$ is a strc node and $T_s$ is of least cost
w.r.t. all possible representations of that form, can be computed in $O(n^2)$ time.

Proof: Construct a weighted, directed acyclic graph G = (V, E, w) with $V = \{v_0, \ldots, v_n\}$,
$E = \{(v_i, v_j) \mid 0 \le i < j \le n,\ j - i < n\}$ and the weight function w which is defined for
all edges $(v_i, v_j)$ in E as $w(v_i, v_j) = 2K_{\mathrm{lookup}} + \mathrm{cost}(T_{i,j-1})$. The intended meaning of this
construction is as follows. A node $v_i$ corresponds to the i-th element of D ($v_n$ is a special vertex
that corresponds to the hypothetical first element after the end of D) and an edge $(v_i, v_j)$ with
i < j corresponds to the segment D[i, j - 1] in normal form. The weight of an edge $(v_i, v_j)$ is
equal to the cost of the optimal representation $T_{i,j-1}$ of the segment D[i, j - 1] (which exists
by the assumption) plus a cost of $2K_{\mathrm{lookup}}$ for including this representation as a subtype in a
strc node. The edge $(v_0, v_n)$, which is not part of the constructed graph, can be thought of as
corresponding to the type tree $T_{0,n-1}$, i.e., the optimal type tree representation of D we want
to compute.

Let $P = (v_0, u_1, \ldots, u_k, v_n)$ be a shortest path in G from $v_0$ to $v_n$ with $u_i \in V$ for $1 \le i \le k$.
Then the basic tree strc($k + 1$, ($D[0], D[u_1], \ldots, D[u_k]$), ($T_{0,u_1-1}, T_{u_1,u_2-1}, \ldots, T_{u_k,n-1}$)) is a valid
representation of D. Note that by construction, for any valid representation of D of the desired
form, a corresponding path from $v_0$ to $v_n$ exists in G and thus a shortest path represents the
desired solution of least cost. Given P, this representation can be constructed in linear time,
since optimal representations for all required segments are known by the assumption. The
resulting graph has $\binom{n+1}{2} - 1$ edges and the runtime is dominated by the $O(n^2)$ time for finding
a shortest path in a DAG. □
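
The shortest-path step of the proof can be sketched as follows. This is a hedged illustration, not the authors' implementation: the callback `seg_cost(i, j)` and the parameter `k_lookup` are assumed names, where `seg_cost(i, j)` is taken to return the cost of an optimal tree for the normalized segment D[i, j - 1].

```python
def best_strc_split(n, seg_cost, k_lookup=1):
    """Shortest v_0 -> v_n path in the DAG of Lemma 3 (needs n >= 2).

    The direct edge (v_0, v_n) is excluded, as in the lemma.
    Returns (path_weight, list_of_vertex_indices).
    """
    INF = float("inf")
    dist = [INF] * (n + 1)
    pred = [None] * (n + 1)
    dist[0] = 0
    # Vertices are topologically sorted by index, so one sweep suffices.
    for j in range(1, n + 1):
        for i in range(j):
            if j - i == n:              # skip the edge (v_0, v_n)
                continue
            w = 2 * k_lookup + seg_cost(i, j)
            if dist[i] + w < dist[j]:
                dist[j], pred[j] = dist[i] + w, i
    # Walk the predecessor links back from v_n to v_0.
    path, v = [], n
    while v is not None:
        path.append(v)
        v = pred[v]
    return dist[n], path[::-1]
```

The inner vertices of the returned path are the positions u_1, ..., u_k at which the strc node splits D into subtypes. The double loop visits every edge once, which matches the quadratic bound.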

We can now give the complete dynamic programming algorithm for constructing optimal
basic trees for displacement sequences in normal form, which proves Lemma 4. Due to Lemma 1,
it suffices to construct an optimal nice basic tree which according to Corollary 2 cannot contain
any shifted nodes nor any idx, vec or strc nodes with count 1. The algorithm is shown in
Listing 5.

Lemma 4 For any input displacement sequence D of length n in normal form, the Basic Type
Reconstruction Problem can be solved in $O(n^4)$ time and $O(n^2)$ space.

Proof: The input to the algorithm is an n-element displacement sequence D in normal
form. The algorithm computes an optimal basic tree T[i,j] for each normalized segment D[i,j],
$0 \le i \le j < n$, which is stored with edge $(v_i, v_{j+1})$ in the constructed graph G. Note that the
solution for the whole input sequence D can be read off of the edge $(v_0, v_n)$.

The algorithm starts with a preprocessing step to find all segments whose normal form is
representable with a single con node. Note that the normal form of any segment of length 1
can trivially be represented as con(1) and since no other valid representations exist for this
particular kind of displacement sequence, this representation is optimal. A straightforward
implementation of this preprocessing step as in Listing 5 is clearly feasible in $O(n^2)$ time.
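
The preprocessing step translates almost directly into Python. The function name `con_segments` and the dictionary result are assumptions of this sketch; in the algorithm itself each found segment becomes an edge of G.

```python
def con_segments(D):
    """Map (i, j) -> count for every segment D[i..j] of consecutive
    displacements, i.e. every segment whose normal form is con(j - i + 1)."""
    n = len(D)
    found = {}
    for i in range(n):
        j = i
        while True:                     # do ... while, as in the listing
            found[(i, j)] = j - i + 1
            j += 1
            if not (j < n and D[j] - D[j - 1] == 1):
                break
    return found
```

Each start position i extends its run at most to the end of D, so the loop body runs O(n^2) times in total.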

The algorithm computes optimal basic tree representations for all normalized segments of
D via a bottom-up dynamic programming approach. The dynamic programming table to
be filled in is implicit in the graph G, where each segment D[i,j] is associated with an edge
$(v_i, v_{j+1})$. Note that after the preprocessing step, solutions for all segments of length 1 are
known. By incrementally computing optimal representations for all segments of length 2, ..., n,
it is ensured that Lemmas 2 and 3 can be applied to compute an optimal representation for
each segment as follows. A basic tree $T_r$, whose root node is either an idx or a vec node, and
a basic tree $T_s$, whose root node is a strc node, are computed. Both are of least cost w.r.t.
all basic tree representations of the desired form. The optimal basic tree for a normalized
segment D[i, i + l - 1] is necessarily one of $T_r$, $T_s$ or a representation via a con node (if such a
representation is possible), which was already computed in the preprocessing step.

Listing 5: Algorithm to find a least-cost basic tree representation.

Function Typetree(D, n)
    /* Initialization */
    G = ({v_0, ..., v_n}, ∅)
    /* Preprocessing: find leaf nodes */
    for i ← 0; i < n; i++ do
        j ← i
        do
            T_{i,j} ← con(j - i + 1)
            w_{i,j} ← 2 K_lookup + cost(T_{i,j})
            Add edge (v_i, v_{j+1}) with basic tree T_{i,j} and weight w_{i,j} to G
            j ← j + 1
        while j < n and D[j] - D[j - 1] == 1
    /* Find solutions for all segments */
    for l ← 2; l ≤ n; l++ do
        for i ← 0; i ≤ n - l; i++ do
            /* Compute optimal basic tree for normalized segment D[i, i + l - 1] */
            j ← i + l - 1
            /* Find best representation with idx or vec node as root */
            Let D_{i,j} be the normalized segment D[i, j]
            T_r ← Repetition(D_{i,j}, l, i)
            T_{i,j} ← Min(T_r, T_{i,j})
            /* Find best representation with strc node as root */
            Find shortest path P from v_i to v_{j+1} in G
            Assume P = (v_i, u_1, ..., u_k, v_{j+1})
            I ← (0, D[u_1] - D[v_i], ..., D[u_k] - D[v_i])
            subtypes ← (T_{i,u_1-1}, ..., T_{u_k,j})
            T_s ← strc(k + 1, I, subtypes)
            T_{i,j} ← Min(T_s, T_{i,j})
            Add edge (v_i, v_{j+1}) with representation T_{i,j} and weight 2 K_lookup + cost(T_{i,j}) to G
    return T_{0,n-1}    /* Stored with edge (v_0, v_n) */

To compute $T_r$, a small, technical extension of procedure Repetition (Listing 4) for finding
representations via idx or vec nodes is necessary. The procedure requires access to optimal
representations of the prefixes of the argument displacement sequence $\hat{D}$. However, in the
general case, $\hat{D}$ is a segment of D, that is, $\hat{D} = D[i,j]$, and its prefixes therefore start with D[i].
To account for this (and avoid copying D[i,j]), we pass an additional argument o representing
the offset of the segment within the input displacement sequence D (i.e., for a segment D[i,j],
we have o = i), and in the lines that construct the idx and vec nodes replace the argument
$T_{0,q-1}$ with $T_{o,o+q-1}$.

To compute $T_s$ in Listing 5, contrary to Lemma 3, we do not construct a new graph for each
segment. Instead, a single dynamic, incrementally built
graph G suffices to solve the problem for all segments of D. By construction, when computing
the desired representation of a segment D[i, i + l - 1], G contains edges representing optimal
representations for all segments of length less than l (and possibly some edges representing
solutions of length l). A shortest path from node $v_i$ to $v_{i+l}$ in G therefore leads to the same
representation as the one constructed by Lemma 3.

To find such a shortest path, for each segment D[i, i + l - 1] of length l, one single-source
shortest path (SSSP) problem on a weighted DAG with l + 1 nodes and $O(l^2)$ edges has to be
solved. Since G is a topologically sorted DAG by construction, SSSP is solvable in O(|V| + |E|)
time, where |V| denotes the number of vertices and |E| denotes the number of edges in G [2].
To compute the desired representations for all segments of length l, a shortest path has to be
computed for each of the n + 1 - l node pairs $(v_i, v_{i+l})$, for $0 \le i < n + 1 - l$. The total runtime
is thus upper bounded by $\sum_{l=2}^{n} (n + 1 - l) \cdot O(l^2)$, which is $O(n^4)$.
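
The quartic bound follows from a short calculation. In the sketch below the constant hidden in the $O(l^2)$ term is dropped, and each factor $n + 1 - l$ is bounded by $n$:

```latex
\[
\sum_{l=2}^{n} (n+1-l)\, l^2
\;\le\; n \sum_{l=1}^{n} l^2
\;=\; n \cdot \frac{n(n+1)(2n+1)}{6}
\;=\; O(n^4).
\]
```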

The algorithm constructs a graph with $O(n^2)$ edges, where a basic tree $T_{i,j}$, representing the
solution for the normalized segment D[i,j], is associated with each edge $(v_i, v_{j+1})$. Note that for
each edge it suffices to store the root node of the associated basic tree plus pointers
to its child nodes, which are already stored with the respective edges. To meet the desired
space bound, only a constant amount of space may be used by each edge and associated basic
tree. This is trivially true for con nodes (apart from one word indicating the node's kind and
the cost of the type tree rooted at the node, only the count c needs to be stored) as well as
vec nodes (two integer values and one pointer to the child node are required in addition to the
node's kind and the cost of the type tree rooted at this node). However, idx and strc nodes may
require $\Omega(n)$ space in the worst case (e.g., if idx(n, D, con(1)) is the optimal representation of
D). We employ a standard trick often used in dynamic programming algorithms and store for
each node only the information required to reconstruct the full solution once the algorithm in
Listing 5 has terminated. If for an idx node the count c is known, the full idx node is easily
derived as idx(c, (D[0], D[q], ..., D[(c - 1)q]), $T_{0,q-1}$) with q = n/c. The parameters of a strc
node associated with an edge $(v_i, v_j)$ can be reconstructed by again computing the shortest
path from node $v_i$ to $v_j$ and mapping it to a strc node as done in Lemma 3. Note that this
reconstruction step does not change the asymptotic runtime bound and that the required space
for each node is O(1), from which the claimed upper bound of $O(n^2)$ space follows directly. □
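
The reconstruction of an idx node from its stored count can be illustrated in a few lines. The helper name `expand_idx` is hypothetical, and the sketch covers only the whole sequence D, not a segment with an offset.

```python
def expand_idx(D, c):
    """Rebuild the index list of an idx root from its count c alone.

    Returns (indices, q), where q = n / c is the length of the repeated
    prefix whose optimal tree T_{0,q-1} is stored elsewhere in the table.
    """
    n = len(D)
    if c <= 0 or n % c != 0:
        raise ValueError("count must be a positive divisor of len(D)")
    q = n // c
    return [D[i * q] for i in range(c)], q
```

Storing only c per idx node keeps the per-edge space constant; the O(c) index list is materialized once, when the final tree is written out.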

The following Corollary 3 and Lemma 5 show how the algorithm of Lemma 4 can be applied
to general displacement sequences.

Corollary 3 For any optimal basic tree with an index node N with count 1, i.e., a node N =
idx(1, ($i_0$), ...), a representation T' of equal cost such that N is the root node of T' exists.

Proof: Due to Corollary [H there is no idx or strc node on the path from N to the root and 
thus N shifts the whole sequence by $i_0$. This shift can be represented equivalently by removing
N from the basic tree and adding a new root node to represent the shift, i.e., by letting
$T' = \mathrm{idx}(1, (i_0), T \setminus N)$. □


Lemma 5 Given optimal basic trees $T_{i,j}$ for all normalized segments D[i,j] of a displacement
sequence D, an optimal basic tree T representing D can be computed in $O(n^2)$ time and O(n)
space.

Proof: By Lemma 1, for any optimal basic tree T a cost-equivalent nice basic tree
representing the same displacement sequence D exists and it therefore suffices to find an optimal
nice basic tree representation T for D. By assumption, an optimal nice basic tree representation
$\bar{T} = T_{0,n-1}$ for the normalized sequence $\bar{D}$ exists. To construct T, find the first node N
on any root to leaf path in $\bar{T}$ that is either an idx or a strc node and add the displacement
sequence's shift s = D[0] to the indices of this node, i.e., if N = idx(c, ($i_0, \ldots, i_{c-1}$), T') in $\bar{T}$, set
N = idx(c, ($i_0 + s, \ldots, i_{c-1} + s$), T') in T and analogously for the case of N being a strc node.
Note that T has the same cost as $\bar{T}$ and thus is an optimal basic tree representation for D.

If such a node does not exist, it follows from Lemma 1 and Corollary 3 that the optimal
solution is either

• T = idx(c, (...), $T_{0,n/c-1}$) for some divisor c of n, or

• T = strc(c, (...), ($T_0, \ldots, T_{c-1}$)), for some c, $1 \le c \le n$.

Note that for idx nodes, both the trivial representation idx(n, D, con(1)) as well as the
representation idx(1, (D[0]), $\bar{T}$), which only adds a shifted node to $\bar{T}$, need to be checked. Since
solutions for all normalized segments are already known, this construction is feasible in $O(n^2)$ time and
O(n) space. □
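
The shift handling in the first part of the proof can be made concrete with a minimal sketch. It assumes that the normal form of D is obtained by subtracting s = D[0] from every displacement; the function names are hypothetical.

```python
def normalize(D):
    """Split D into its shift s = D[0] and its normal form."""
    s = D[0]
    return s, [d - s for d in D]


def shift_indices(indices, s):
    """Add the shift s to the index list of the chosen idx or strc node."""
    return [i + s for i in indices]
```

For D = (5, 6, 15, 16) the normal form (0, 1, 10, 11) is represented by idx(2, (0, 10), con(2)); shifting the indices by s = 5 gives idx(2, (5, 15), con(2)), which represents D at the same cost.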

Proof: [of Theorem 1] The Basic Type Reconstruction Problem for a displacement
sequence D of length n can be solved by computing an optimal basic tree representation for the
normalized displacement sequence $\bar{D}$ (Lemma 4) and the post-processing step given in Lemma 5.
The claimed space and time bounds follow directly from the given Lemmas. □


4 Computing more concise representations 

In this section we discuss possibly more space efficient tree representations by allowing a richer 
set of constructors, exemplified by the auxiliary constructors introduced in Definition 2. We
then explain why computing representations by DAGs is an apparently harder problem. Finally, 
we discuss the applicability of our algorithms to the type normalization problem. 

4.1 Handling the auxiliary constructors 

The auxiliary constructors of Definition 2 can be handled by slight extensions to our algorithm
in a way that polynomial-time type reconstruction is still possible. Basically, only the part that
checks for vector or index patterns shown in Listing 4 needs to be extended. Assume that a
repeated prefix C of length q has been found in the given displacement sequence D, and that D' is
the displacement sequence consisting of every q-th element of D, i.e., D' = (D[0], D[q], D[2q], ...).

The strided bucket constructor, vecbuc(c, d, e, ($b_0, b_1, \ldots, b_{c-1}$), C), can concisely describe
application data layouts consisting of buckets each with some maximum number of elements (the
stride d) where each bucket contains some (possibly different) number of elements $b_i$ with bucket
stride e. This description is likely to be less costly than describing such a layout by a strc
constructor with each subtype describing one bucket. To incorporate the strided bucket it simply
has to be checked in Listing 4 whether D' follows the strided bucket pattern, and this can easily
be done in linear time. There are two cases to consider. If the first bucket has more than one
element, take as bucket stride e = D'[1] - D'[0] and scan the index list for repetitions at stride
e. The first violation at some position i forces the maximum bucket size to be d = D'[i] - D'[0].
Now continue to scan till the end of D', checking that the e, d strided pattern repeats and
counting the number of elements $b_i$ in each bucket of e-strided displacements. Otherwise, the
first bucket has only one element. Take instead as maximum bucket size d = D'[1] - D'[0], and
scan for repetitions with stride d. The first violation at some position i forces the bucket stride
to be e = D'[i] - D'[i - 1]. As in the other case, the bucket sizes $b_i$ are counted by scanning
D' till the end. If an index i is found where D'[i] - D'[i - 1] ≠ e and D'[i] - D'[j] ≠ d, where
j is the start of the current bucket in D', then D' is not a displacement sequence of a strided
bucket layout.
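
The two-case scan can be sketched as follows. This is one possible reading of the description, not a definitive implementation: the function names are hypothetical, and the sketch simply tries the first case and falls back to the second, since the text does not say how the case is chosen in advance.

```python
def bucket_sizes(Dp, d, e):
    """Check Dp against bucket distance d and in-bucket stride e.
    Returns the list of bucket sizes, or None if the pattern is violated."""
    sizes, start = [1], 0
    for i in range(1, len(Dp)):
        if Dp[i] - Dp[i - 1] == e:
            sizes[-1] += 1              # same bucket, next e-strided element
        elif Dp[i] - Dp[start] == d:
            sizes.append(1)             # next bucket starts d after the last
            start = i
        else:
            return None                 # neither stride fits
    return sizes


def detect_strided_bucket(Dp):
    """Try both cases from the text; return (d, e, sizes) or None."""
    n = len(Dp)
    if n < 3:
        return None
    first = Dp[1] - Dp[0]
    # First position whose step differs from the first step.
    i = next((k for k in range(2, n) if Dp[k] - Dp[k - 1] != first), None)
    if i is None:
        return None                     # purely strided: left to the vec check
    # Case 1: the first bucket has several elements, so `first` is e.
    # Case 2: the first bucket has one element, so `first` is d.
    for d, e in ((Dp[i] - Dp[0], first), (first, Dp[i] - Dp[i - 1])):
        sizes = bucket_sizes(Dp, d, e)
        if sizes is not None:
            return d, e, sizes
    return None
```

Each hypothesis is verified with a single pass over D', so the check stays linear in the length of D'.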

The strided bucket constructor is in a sense the opposite of the index constructor. Instead 
of an index sequence it takes a sequence of bucket sizes, and has (roughly) the same cost. 
Interestingly, there is no such constructor in the MPI standard. 

The indexed bucket, idxbuc(c, e, ...), on the other hand corresponds
closely to the MPI_Type_indexed constructor. For each index, a repetition count $b_i$ gives
the number of repeats of C in the bucket starting at that index; all repetitions use the same
stride e (the constructor could trivially be extended to the case where each index has its own
stride). For each possible bucket stride, the number of buckets that this stride will give rise to
has to be counted. The stride e leading to a smallest number of buckets is a candidate for the
representation of D' and C as an idxbuc node. We observe that each i with D'[i + 1] - D'[i] = e
joins two e-strided segments D'[j, i] and D'[i + 1, k] into one bucket starting at index j.
Therefore, the stride that occurs most often in the stride sequence S[i] = D'[i + 1] - D'[i], $0 \le i < n - 1$,
will lead to the smallest number of buckets. To count the number of occurrences of each stride,
we either sort S or count by hashing during the scan of D'. Let e be a stride with the most
occurrences. A final scan of D' suffices to compute the start indices and sizes of the buckets
with stride e.
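
The stride-counting argument can be sketched with hashing via `collections.Counter`. The function name `indexed_buckets` and the returned triple are assumptions of this example.

```python
from collections import Counter


def indexed_buckets(Dp):
    """Pick the most frequent stride e of Dp and split Dp into maximal
    e-strided buckets. Returns (e, bucket_starts, bucket_sizes)."""
    if len(Dp) < 2:
        return None
    strides = [b - a for a, b in zip(Dp, Dp[1:])]
    # Count by hashing; a most frequent stride minimizes the bucket count.
    e = Counter(strides).most_common(1)[0][0]
    starts, sizes = [Dp[0]], [1]
    for prev, cur in zip(Dp, Dp[1:]):
        if cur - prev == e:
            sizes[-1] += 1              # extends the current bucket
        else:
            starts.append(cur)          # a new bucket begins here
            sizes.append(1)
    return e, starts, sizes
```

With m occurrences of e among the len(D') - 1 strides, the result has len(D') - m buckets, which is why the most frequent stride is the one to choose.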

4.2 Type reconstruction into DAGs 

A type tree describing some given displacement sequence may have multiple instances of the 
same subtree. Our algorithm in particular constructs nice type trees (Definition 5) in which
all displacement sequences in index and struct nodes except perhaps one start at index 0, and 
it can well happen that the same index or struct node occurs many times. A more concise 
representation results if such trees are folded into directed acyclic graphs with only one node 
for each substructure. 

Type DAGs represent displacement sequences by the same flattening procedure as shown 
in Listing 1 for trees. Each path from the root node in the type DAG to a leaf is traversed in
order to generate the corresponding displacement sequence. Thus the processing cost of a type 
DAG would arguably be similar to the processing costs of a tree. By a similar traversal of a 
DAG an equivalent tree can be constructed, simply by making a new copy each time a node is 
visited. 

The space required for the DAG can be much smaller than the space required for the 
corresponding tree. One can therefore define also for DAGs our cost model for optimizing 
conciseness as the additive cost of the nodes in the DAG; and not as the sum of the costs of 
all paths traversed. The type reconstruction problem into DAGs is now to find the least-cost 
DAG representing the given displacement sequence. 
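
Folding a tree into a DAG amounts to sharing equal subtrees. The sketch below uses an assumed encoding in which a tree is a tuple (label, child, child, ...), and it counts one unit per node as a stand-in for the additive cost model; the function names are hypothetical.

```python
def fold(tree, table=None):
    """Intern the subtrees of a tuple-encoded type tree.

    Returns (node_id, table), where table maps each distinct subtree
    (label plus the ids of its children) to a unique id. Equal subtrees
    receive the same id, as they would share one node in a type DAG.
    """
    if table is None:
        table = {}
    label, *children = tree
    key = (label,) + tuple(fold(ch, table)[0] for ch in children)
    node_id = table.setdefault(key, len(table))
    return node_id, table


def tree_size(tree):
    """Number of nodes of the unfolded tree."""
    return 1 + sum(tree_size(ch) for ch in tree[1:])
```

`len(table)` is the number of DAG nodes and `tree_size` the number of tree nodes; the gap between the two is the saving from sharing. Folding an optimal tree in this way does not, in general, yield an optimal DAG.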

One crucial difficulty which arises when dealing with such type DAGs is that the best 
representation for a subsequence no longer needs to be locally optimal, since cost savings can
be achieved by reusing other nodes of the DAG. This is illustrated in Figure 2.

In particular, this implies that the type tree constructed by unfolding a cost-optimal DAG 
is not necessarily a cost-optimal tree, and conversely, that the DAG obtained by folding a given, 
cost-optimal type tree is not necessarily a cost-optimal DAG. This constitutes a fundamental 
problem for our general approach for handling type trees, and new ideas are needed to solve 
the type reconstruction problem into DAGs. 
[Figure 2: tree diagram. Node labels: strc(3, (0,110,130)) at the root; strc(2, (0,5)); con(1);
idx(20,X), twice; con(3), three times.]

Figure 2: An unfolding of an optimal type DAG representing a displacement sequence D; X is 
an arbitrary subsequence of length 20 over 0,1,..., 99. The subtrees rooted at idx(20, X) only 
contribute to the cost function once. Notice that the subtree rooted at strc(2, (0,5)) is not a 
least-cost type tree representation of the represented subsequence. 

4.3 The type normalization problem 

The type normalization problem subsumes the type reconstruction problem that we have con¬ 
sidered so far. Type normalization asks to improve the cost of an already given tree description 
of the data layout. Since any data layout can be represented as a single idx node with the whole 
displacement sequence as index sequence, type normalization includes type reconstruction as 
a special case. Type normalization is the problem that compiler or library implementors are 
typically faced with: application data structures described as trees are given by the program¬ 
mer as part of the code, and an internal, optimal representation is to be constructed by the 
programming system. 

The trivial solution is to flatten the given type tree and apply the type reconstruction 
algorithm on the resulting displacement sequence. Since the size of the resulting displacement 
sequence is not bounded by the size or conciseness of the tree, this is highly undesirable. We 
would like a procedure where the complexity can be bounded by the conciseness of the type 
trees, specifically the total size of the index sequences in the tree. 
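
Why flattening is undesirable can be seen from a small sketch of the flattening step. The tuple encoding and the constructor semantics below are assumptions inferred from how the constructors are used in this paper; they are not a transcription of Listing 1.

```python
def flatten(t, base=0):
    """Displacement sequence of a tuple-encoded basic type tree."""
    kind = t[0]
    if kind == "con":                       # ("con", c)
        return [base + k for k in range(t[1])]
    if kind == "vec":                       # ("vec", c, d, sub)
        _, c, d, sub = t
        return [x for k in range(c) for x in flatten(sub, base + k * d)]
    if kind == "idx":                       # ("idx", c, indices, sub)
        _, c, idxs, sub = t
        return [x for i in idxs for x in flatten(sub, base + i)]
    if kind == "strc":                      # ("strc", c, indices, subs)
        _, c, idxs, subs = t
        return [x for i, s in zip(idxs, subs) for x in flatten(s, base + i)]
    raise ValueError(f"unknown constructor {kind!r}")
```

A two-node tree such as `("vec", 1000, 3, ("con", 2))` flattens to 2000 displacements, so the input handed to the reconstruction algorithm can be far larger than the tree it came from.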

As shown in m, if the set of basic constructors is restricted to exclude the strc constructor, 
it is possible to perform type normalization by only rechecking optimality of the idx nodes. In 
this case, type normalization can be done in time proportional to the conciseness of the given 
tree. When the strc constructor is allowed, arbitrarily more concise representations can be 
possible as shown in Figure 1. Optimality of a subtree that does not use the strc constructor
does therefore not imply optimality when strc is allowed. It is therefore necessary to flatten the 
whole tree and apply the tree reconstruction algorithm on the resulting displacement sequence. 

5 Conclusion 

The main result of this paper is that the type reconstruction problem into trees is actually 
solvable in polynomial time. However, an $O(n^4)$ algorithm is not useful for larger values of
n as might be the case in parallel applications where n could be proportional to the number 
of processors which in itself could be in the range of tens to hundreds of thousands. We 
note that our bottom-up dynamic programming algorithm performs a considerable amount of 
almost redundant checking for (strided) repetitions in displacement sequence segments. An 
asymptotically more efficient algorithm, perhaps based on a top-down approach, is likely to 
exist. Whether an exact, practically efficient algorithm for the full problem is possible, we do 
not know at the point of writing. 
Restricting the power of the constructors can permit more efficient algorithms. As shown
in m, if only con, vec and idx nodes are allowed, then the type reconstruction problem for
a displacement sequence of length n can be solved in $O(n\sqrt{n})$ time. However, the resulting
restricted trees can and often will be much more costly, as shown in Figure 1. The high
complexity of our algorithm is caused by the unbounded branching of the strc constructor. A
slightly better, $O(n^3)$ time algorithm would result from allowing only bounded branching, for
instance a binary struct constructor that catenates only two subtrees. For such a constructor,
the shortest path computation of Lemma 3 could be done in linear time. In some contexts,
bounded branching might be sufficiently expressive.

An alternative approach would be to look for low-complexity approximation algorithms with 
provable approximation guarantees. Or, even weaker, for heuristics that perhaps work well for 
the intended application cases. This reflects the state in current MPI libraries. 

As discussed, type trees can be represented more concisely as directed acyclic graphs (DAGs). 
To the best of our knowledge, it is still open whether a cost-optimal DAG representation for an 
arbitrary displacement sequence can likewise be constructed in polynomial time. 

A related problem to consider is the following. Given two displacement sequences of the same 
length, construct a least-cost tree (or DAG) representing a mapping between the two sequences. 
Such a tree (DAG) has uses when copying between different data layouts; this arises, e.g., in 
matrix transposition. In the MPI context this operation has been called transpacking mm- 
Our dynamic programming algorithm may extend to this case as well. 

Our work was specifically inspired by the derived data type mechanism of MPI. We believe 
that this idea is applicable in a much wider context of (parallel) programming interfaces and 
languages, and that the type normalization and reconstruction problems as defined here, as
well as the associated processing of data layouts represented by trees, have relevance extending 
beyond the motivating context. 